**Note:** There are many ways to solve both Q1 part b and Q2. Only one solution is shown in this rubric.

## Set A

#### Ans 1.

- a. Hazards:
  - RAW: R2 for instructions SUB and MUL; R5 for instructions MUL and DIV
  - WAR: R5 for instructions SUB and MUL; R2 for instructions ADD and SUB
  - WAW: R1 for instructions ADD and DIV
- b. Renamed sequence:
  - ADD p1, p2, p3
  - SUB p9, p4, p5
  - MUL p10, p9, p3
  - DIV p11, p4, p10

1 mark deducted if architectural registers are used in renaming.

1 mark deducted if RAW hazard removed
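The hazard list in part a and the renamed sequence in part b can be cross-checked mechanically. The sketch below is a simplified pairwise check, not part of the rubric. The architectural program in `ASSUMED_ORIGINAL` is an assumption: it is reconstructed from the hazards and the renamed answer above, since the question paper is not reproduced here. The function name `find_hazards` is likewise hypothetical.

```python
# Each instruction is (opcode, dest, src1, src2).
# ASSUMPTION: reconstructed from the rubric's hazard list; not quoted from the question paper.
ASSUMED_ORIGINAL = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R2", "R4", "R5"),
    ("MUL", "R5", "R2", "R3"),
    ("DIV", "R1", "R4", "R5"),
]

# Renamed sequence exactly as given in part b of the rubric.
RENAMED = [
    ("ADD", "p1", "p2", "p3"),
    ("SUB", "p9", "p4", "p5"),
    ("MUL", "p10", "p9", "p3"),
    ("DIV", "p11", "p4", "p10"),
]


def find_hazards(program):
    """Return (kind, register, earlier_op, later_op) for every instruction pair.

    Simplified: compares all pairs and ignores intervening redefinitions,
    which is sufficient for a four-instruction sequence like this one.
    """
    hazards = []
    for i, (op_i, dest_i, a_i, b_i) in enumerate(program):
        for op_j, dest_j, a_j, b_j in program[i + 1:]:
            if dest_i in (a_j, b_j):      # later instruction reads what the earlier one wrote
                hazards.append(("RAW", dest_i, op_i, op_j))
            if dest_j in (a_i, b_i):      # later instruction overwrites what the earlier one read
                hazards.append(("WAR", dest_j, op_i, op_j))
            if dest_i == dest_j:          # both instructions write the same register
                hazards.append(("WAW", dest_i, op_i, op_j))
    return hazards


if __name__ == "__main__":
    for hazard in find_hazards(ASSUMED_ORIGINAL):
        print("original:", hazard)
    for hazard in find_hazards(RENAMED):
        print("renamed: ", hazard)
```

Under these assumptions, the renamed sequence should report only the two RAW dependences, which is why the rubric deducts a mark when a RAW hazard disappears after renaming.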

#### Ans 2.

a.

| Clock Cycle | Integer Unit     | Floating point   | Load/Store Unit    |
|-------------|------------------|------------------|--------------------|
| 1           |                  |                  | lw F1, 0(R1)       |
| 2           | addiu R1, R1, 4  |                  |                    |
| 3           |                  |                  |                    |
| 4           |                  |                  |                    |
| 5           |                  | add.s F2, F3, F1 |                    |
| 6           |                  |                  |                    |
| 7           | addiu R2, R2, 4  |                  | sw F2, 0(R2)       |
| 8           | addiu R3, R3,1   |                  |                    |
| 9           | bne R3, R4, loop |                  |                    |

FLOPS = 1/9 ≈ 0.11 (one floating-point operation in 9 clock cycles)

1 mark deducted if the scheduling takes more than 14 clock cycles

b.

lw F1, 0(R1)<br>
lw F2, 4(R1)<br>
lw F3, 8(R1)<br>
addiu R1, R1, 12<br>
add.s F4, R3, F1<br>
addiu R5, R3, 1<br>
add.s F5, R5, F2<br>
addiu R6, R3, 2<br>
add.s F6, R6, F3<br>
sw F4, 0(R2)<br>
sw F5, 4(R2)<br>
sw F6, 8(R2)<br>
addiu R2, R2, 12<br>
addiu R3, R3, 3<br>
bne R3, R4, loop

- 1 mark deducted for incorrect loop increment
- 1 mark deducted for incorrect array increment
- 1 mark deducted if scalar value not incremented while unrolling
- 1 mark deducted if consecutive array locations are not accessed (the unrolled copies must not access the same memory location)
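The four deductions above correspond to the four things that must change together when a loop is unrolled three times. The sketch below illustrates them in Python. The loop semantics are an assumption inferred from the unrolled assembly (each element has the current value of the scalar in R3 added to it, and the scalar advances by `step` per element); the function names `rolled` and `unrolled_by_3` are hypothetical.

```python
def rolled(a, scalar, step=1):
    """Reference loop: one load, one add, one store per iteration."""
    out = []
    for x in a:
        out.append(x + scalar)
        scalar += step            # loop increment (addiu R3, R3, step)
    return out


def unrolled_by_3(a, scalar, step=1):
    """Same computation with the body replicated three times."""
    assert len(a) % 3 == 0, "sketch assumes the trip count is a multiple of 3"
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Consecutive array locations: offsets 0, 4 and 8 bytes in the assembly.
        x0, x1, x2 = a[i], a[i + 1], a[i + 2]
        # The scalar is incremented for each unrolled copy.
        out[i] = x0 + scalar
        out[i + 1] = x1 + (scalar + step)
        out[i + 2] = x2 + (scalar + 2 * step)
        i += 3                    # array increment: 3 elements (12 bytes)
        scalar += 3 * step        # loop increment: 3 * step
    return out


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50, 60]
    print(rolled(data, 0, 1))
    print(unrolled_by_3(data, 0, 1))
```

With `step=1` this mirrors Set A; `step=2` and `step=3` mirror the loop increments of 6 and 9 used in the other sets.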

c.

| Clock Cycle | Integer Unit     | Floating point   | Load/Store Unit |
|-------------|------------------|------------------|-----------------|
| 1           |                  |                  | lw F1, 0(R1)    |
| 2           |                  |                  | lw F2, 4(R1)    |
| 3           |                  |                  | lw F3, 8(R1)    |
| 4           | addiu R1, R1,12  |                  |                 |
| 5           | addiu R5, R3,1   | add.s F4, R3, F1 |                 |
| 6           | addiu R6, R3,2   | add.s F5, R5, F2 |                 |
| 7           |                  | add.s F6, R6, F3 | sw F4, 0(R2)    |
| 8           |                  |                  | sw F5, 4(R2)    |
| 9           |                  |                  | sw F6, 8(R2)    |
| 10          | addiu R3, R3,3   |                  |                 |
| 11          | addiu R2, R2,12  |                  |                 |
| 12          | bne R3, R4, loop |                  |                 |

FLOPS = 3/12 = 0.25
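The FLOPS figures for parts a and c follow directly from the floating-point column of each schedule. The sketch below recomputes them; the helper name `flops_per_cycle` is hypothetical, and the lists simply transcribe the floating-point column of the two tables, one entry per clock cycle.

```python
def flops_per_cycle(fp_column):
    """Floating-point operations issued divided by cycles in the schedule."""
    fp_ops = sum(1 for slot in fp_column if slot.startswith("add.s"))
    return fp_ops / len(fp_column)


# Floating-point column of the part a schedule (9 cycles).
ROLLED_FP = ["", "", "", "", "add.s F2, F3, F1", "", "", "", ""]

# Floating-point column of the part c schedule (12 cycles).
UNROLLED_FP = [
    "", "", "", "",
    "add.s F4, R3, F1", "add.s F5, R5, F2", "add.s F6, R6, F3",
    "", "", "", "", "",
]

if __name__ == "__main__":
    rolled_rate = flops_per_cycle(ROLLED_FP)
    unrolled_rate = flops_per_cycle(UNROLLED_FP)
    print(f"part a: {rolled_rate:.2f}")
    print(f"part c: {unrolled_rate:.2f}")
    print(f"improvement: {unrolled_rate / rolled_rate:.2f}x")
```

The ratio of the two rates is (3/12) / (1/9) = 2.25, which is the quantitative form of the "more parallelism" keyword expected in part d.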

d. Keyword - More parallelism

## Set B

#### Ans 1.

- a. Hazards:
  - RAW: R5 for instructions SUB2 and MUL
  - WAR: R1 for instructions SUB2 and MUL
  - WAW: R7 for instructions ADD and SUB1
- b. Renamed sequence:
  - SUB x7, x2, x3
  - SUB x5, x4, x1
  - MUL x9, x5, x3
  - ADD x10, x4, x6

1 mark deducted if architectural registers are used in renaming.

1 mark deducted if RAW hazard removed
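Both deductions above follow from how renaming works: every write gets a fresh physical register (so no architectural name survives), while every read uses the current mapping (so true dependences are kept). The sketch below is one minimal way to do this, not the rubric's method. The input program is an assumption reconstructed from the hazard list and the renamed answer, and the generated names `p0`, `p1`, ... are hypothetical, so they differ from the `x` names used in the rubric.

```python
from itertools import count

# ASSUMPTION: reconstructed from the rubric's hazards; not quoted from the question paper.
ASSUMED_SET_B = [
    ("SUB", "R7", "R2", "R3"),
    ("SUB", "R5", "R4", "R1"),
    ("MUL", "R1", "R5", "R3"),
    ("ADD", "R7", "R4", "R6"),
]


def rename(program, prefix="p"):
    """Map architectural registers to physical ones, one fresh register per write."""
    mapping = {}          # architectural name -> current physical name
    fresh = count(0)      # endless supply of physical register numbers

    def lookup(reg):
        # First read of a register that has not been written yet: bind it once.
        if reg not in mapping:
            mapping[reg] = f"{prefix}{next(fresh)}"
        return mapping[reg]

    renamed = []
    for op, dest, src1, src2 in program:
        # Sources are resolved BEFORE the destination is remapped,
        # so an instruction like SUB R4, R4, R1 reads the old value.
        phys1, phys2 = lookup(src1), lookup(src2)
        mapping[dest] = f"{prefix}{next(fresh)}"
        renamed.append((op, mapping[dest], phys1, phys2))
    return renamed


if __name__ == "__main__":
    for instr in rename(ASSUMED_SET_B):
        print(instr)
```

In the output, all four destinations are distinct (no WAW), no later destination matches an earlier source (no WAR), and the MUL still reads the register written by the second SUB (the RAW dependence is preserved).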

#### Ans 2.

a.

| Clock Cycle | Integer Unit     | Floating point   | Load/Store Unit |
|-------------|------------------|------------------|-----------------|
| 1           |                  |                  | lw F1, 0(R1)    |
| 2           | addiu R1, R1, 4  |                  |                 |
| 3           |                  |                  |                 |
| 4           |                  |                  |                 |
| 5           |                  | add.s F2, F3, F1 |                 |
| 6           |                  |                  |                 |
| 7           | addiu R2, R2, 4  |                  | sw F2, 0(R2)    |
| 8           | addiu R3, R3,2   |                  |                 |
| 9           | bne R3, R4, loop |                  |                 |

FLOPS = 1/9 ≈ 0.11 (one floating-point operation in 9 clock cycles)

1 mark deducted if the scheduling takes more than 14 clock cycles

b.

```
lw F1, 0(R1)
lw F2, 4(R1)
lw F3, 8(R1)
addiu R1, R1, 12/24
add.s F4, R3, F1
addiu R5, R3,2
add.s F5, R5, F2
addiu R6, R3,4
add.s F6, R6, F3
sw F4, 0(R2)
sw F5, 4(R2)
sw F6, 8(R2)
addiu R2, R2, 12/24
addiu R3, R3, 6
bne R3, R4, loop
```

- 1 mark deducted for incorrect loop increment
- 1 mark deducted for incorrect array increment
- 1 mark deducted if scalar value not incremented while unrolling
- 1 mark deducted if consecutive array locations are not accessed (the unrolled copies must not access the same memory location)

c.

| Clock Cycle | Integer Unit       | Floating point   | Load/Store Unit |
|-------------|--------------------|------------------|-----------------|
| 1           |                    |                  | lw F1, 0(R1)    |
| 2           |                    |                  | lw F2, 4(R1)    |
| 3           |                    |                  | lw F3, 8(R1)    |
| 4           | addiu R1, R1,12/24 |                  |                 |
| 5           | addiu R5,R3,2      | add.s F4, R3, F1 |                 |
| 6           | addiu R6,R3,4      | add.s F5, R5, F2 |                 |
| 7           |                    | add.s F6, R6, F3 | sw F4, 0(R2)    |
| 8           |                    |                  | sw F5, 4(R2)    |
| 9           |                    |                  | sw F6, 8(R2)    |
| 10          | addiu R3, R3,6     |                  |                 |
| 11          | addiu R2, R2,12/24 |                  |                 |
| 12          | bne R3, R4, loop   |                  |                 |

FLOPS = 3/12 = 0.25

d. Keyword - More parallelism

## Set C

#### Ans 1.

- a. Hazards:
  - RAW: R4 for instructions SUB and DIV; R4 for instructions SUB and ADD2
  - WAR: R1 for instructions SUB and DIV
  - WAW: R7 for instructions ADD1 and ADD2
- b. Renamed sequence:
  - ADD e7, e2, e3
  - SUB e4, e4, e1
  - DIV e9, e4, e3
  - ADD e10, e4, e6

1 mark deducted if architectural registers are used in renaming.

1 mark deducted if RAW hazard removed

#### Ans 2.

a.

| Clock Cycle | Integer Unit        | Floating point   | Load/Store Unit |
|-------------|---------------------|------------------|-----------------|
| 1           |                     |                  | lw F1, 0(R1)    |
| 2           | addiu R1, R1, 4     |                  |                 |
| 3           |                     |                  |                 |
| 4           |                     |                  |                 |
| 5           |                     | add.s F2, F3, F1 |                 |
| 6           |                     |                  |                 |
| 7           | addiu R2, R2, 4     |                  | sw F2, 0(R2)    |
| 8           | addiu R3, R3,3      |                  |                 |
| 9           | bne R3, R4, loop    |                  |                 |

FLOPS = 1/9 ≈ 0.11 (one floating-point operation in 9 clock cycles)

1 mark deducted if the scheduling takes more than 14 clock cycles

b.

```
lw F1, 0(R1)
lw F2, 4(R1)
lw F3, 8(R1)
addiu R1, R1, 12/36
add.s F4, R3, F1
addiu R5, R3,3
add.s F5, R5, F2
addiu R6, R3,6
add.s F6, R6, F3
sw F4, 0(R2)
sw F5, 4(R2)
sw F6, 8(R2)
addiu R2, R2, 12/36
addiu R3, R3, 9
bne R3, R4, loop
```

- 1 mark deducted for incorrect loop increment
- 1 mark deducted for incorrect array increment
- 1 mark deducted if scalar value not incremented while unrolling
- 1 mark deducted if consecutive array locations are not accessed (the unrolled copies must not access the same memory location)

c.

| Clock Cycle | Integer Unit       | Floating point   | Load/Store Unit |
|-------------|--------------------|------------------|-----------------|
| 1           |                    |                  | lw F1, 0(R1)    |
| 2           |                    |                  | lw F2, 4(R1)    |
| 3           |                    |                  | lw F3, 8(R1)    |
| 4           | addiu R1, R1,12/36 |                  |                 |
| 5           | addiu R5,R3,3      | add.s F4, R3, F1 |                 |
| 6           | addiu R6,R3,6      | add.s F5, R5, F2 |                 |
| 7           |                    | add.s F6, R6, F3 | sw F4, 0(R2)    |
| 8           |                    |                  | sw F5, 4(R2)    |
| 9           |                    |                  | sw F6, 8(R2)    |
| 10          | addiu R3, R3,9     |                  |                 |
| 11          | addiu R2, R2,12/36 |                  |                 |
| 12          | bne R3, R4, loop   |                  |                 |

FLOPS = 3/12 = 0.25

d. Keyword - More parallelism